Corpus-dependent Association Thesauri for Information Retrieval

نویسندگان

  • Hiroyuki Kaji
  • Yasutsugu Morimoto
  • Toshiko Aizono
  • Noriyuki Yamasaki
چکیده

This paper presents a method for automatically generating an association thesaurus from a text corpus, and demonstrates its application to information retrieval. The thesaurus generation method .consists of extracting tenns and co-occurrence data from a corpus and analyzing the correlation between terms statistically. A new method for disambiguating the structure of compound nouns, which is a key component for term extraction, is also proposed. The automatically generated thesaurus is effectively used as a tool for exploring infonnation. A thesaurus navigator having novel functions such as term clustering, thesaurus overview, and zooming-in is proposed.

برای دانلود رایگان متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

An Association Thesaurus for Information Retrieval

Although commonly used in both commercial and experimental information retrieval systems, thesauri have not demonstrated consistent beneets for retrieval performance, and it is diicult to construct a thesaurus automatically for large text databases. In this paper, an approach, called PhraseFinder, is proposed to construct collection-dependent association thesauri automatically using large full-...

متن کامل

Complementing WordNet with Roget's and Corpus-based Thesauri for Information Retrieval

This paper proposes a method to overcome the drawbacks of WordNet when applied to information retrieval by complementing it with Roget 's thesaurus and corpus-derived thesauri. Words and relations which are not included in WordNet can be found in the corpus-derived thesauri. Effects of polysemy can be minimized with weighting method considering all query terms and all of the thesauri. Experimen...

متن کامل

Similarity Thesauri and Cross-Language Retrieval

This paper describes a method for constructing a thesaurus automatically from a corpus of suitable documents, using standard information retrieval methods. The resulting thesauri can be used for user-initiated query expansion, automatic query expansion, as well as cross-language retrieval. Researchers at the Swiss Federal Institute of Technology in Zürich developed and evaluated this method in ...

متن کامل

Cross-Language Information Retrieval in a Multilingual Legal Domain

We describe here the application of a cross-language information retrieval technique based on similarity thesauri in the domain of Swiss law. We present the theory of similarity thesauri, which are information structures deerived from corpora, and show how they can be used for cross-language retrieval. We also discuss the collections of Swiss legal documents and show how we have used them to co...

متن کامل

Automatic processing of multilingual medical terminology: applications to thesaurus enrichment and cross-language information retrieval

OBJECTIVES We present in this article experiments on multi-language information extraction and access in the medical domain. For such applications, multilingual terminology plays a crucial role when working on specialized languages and specific domains. MATERIAL AND METHODS We propose firstly a method for enriching multilingual thesauri which extracts new terms from parallel corpora, and seco...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

عنوان ژورنال:

دوره   شماره 

صفحات  -

تاریخ انتشار 2000